Log-linear models and latent semantic indexing applied to MWE identification
Abstract
A short introduction characterizes the task of identifying multiword expressions (MWEs) and their idiosyncratic properties. The document then gives a detailed description of log-linear models and latent semantic analysis: it enumerates the components of the models, covers estimation techniques for the model parameters, and addresses the interpretation and evaluation of the models. We also briefly report on how the models have previously been used to identify multiword expressions, and we describe how they can be used in the R statistics package. A small case study reports on preliminary experiments on quantifying two linguistic attributes. We explore to what extent these attributes identify MWEs and whether a statistical dependence between them can be captured by a log-linear model.
Similar resources
Automatic Identification Of Non-Compositional Multi-Word Expressions Using Latent Semantic Analysis
Making use of latent semantic analysis, we explore the hypothesis that local linguistic context can serve to identify multi-word expressions that have noncompositional meanings. We propose that vector-similarity between distribution vectors associated with an MWE as a whole and those associated with its constituent parts can serve as a good measure of the degree to which the MWE is composition...
Human Expert Modelling Using Numerical Linear Algebra: a Heavy Industry Case Study
The article describes our experience with a method for automatic identification of image semantics, applied at the coking plant of Mittal Steel Ostrava in the Czech Republic. The image retrieval algorithm is based on Latent Semantic Indexing (LSI) and involves Singular Value Decomposition of a document matrix. Numerical experiments on a real data collection indicate feasibility of the he...
Modeling the Non-Substitutability of Multiword Expressions with Distributional Semantics and a Log-Linear Model
Non-substitutability is a property of Multiword Expressions (MWEs) that often causes lexical rigidity and is relevant for most types of MWEs. Efficient identification of this property can result in the efficient identification of MWEs. In this work we propose using distributional semantics, in the form of word embeddings, to identify candidate substitutions for a candidate MWE and model its sub...
A Semantic Feature for Statistical Machine Translation
A semantic feature for statistical machine translation, based on Latent Semantic Indexing, is proposed and evaluated. The objective of the proposed feature is to account for the degree of similarity between a given input sentence and each individual sentence in the training dataset. This similarity is computed in a reduced vector space constructed by means of the Latent Semantic Indexing decompo...
Determining Compositionality of Word Expressions Using Word Space Models
This research focuses on determining semantic compositionality of word expressions using word space models (WSMs). We discuss previous works employing WSMs and present differences in the proposed approaches which include types of WSMs, corpora, preprocessing techniques, methods for determining compositionality, and evaluation testbeds. We also present results of our own approach for determining...
Publication date: 2005